Seamless Integration of Parallelism and Memory Hierarchy

Authors

  • Carlo Fantozzi
  • Andrea Pietracaprina
  • Geppino Pucci
Abstract

We prove an analogue of Brent’s lemma for BSP-like parallel machines featuring a hierarchical structure for both the interconnection and the memory. Specifically, for these machines we present a uniform scheme to simulate any computation designed for v processors on a v'-processor configuration with v' ≤ v and the same overall memory size. For a wide class of computations the simulation exhibits optimal O(v/v') slowdown. The simulation strategy aims at translating communication locality into temporal locality. As an important special case (v' = 1), our simulation can be employed to obtain efficient hierarchy-conscious sequential algorithms from efficient fine-grained ones.
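
To make the O(v/v') slowdown concrete, here is a minimal Python sketch of the generic Brent-style idea: each of v' physical processors sequentially runs a contiguous block of about v/v' virtual processors for one superstep. This is not the authors' simulation scheme, which also handles the hierarchical interconnection and memory. The names `simulate_superstep` and `local_step` are hypothetical, chosen only for illustration.

```python
from math import ceil


def simulate_superstep(v, v_prime, local_step):
    """Run one superstep of a v-processor program on v_prime processors.

    Illustrative assumption: virtual processors are split into contiguous
    blocks, and each physical processor runs its block one after another.
    Returns (results, rounds), where rounds = ceil(v / v_prime) is the
    number of sequential local steps per physical processor.
    """
    if not 1 <= v_prime <= v:
        raise ValueError("need 1 <= v_prime <= v")
    block = ceil(v / v_prime)
    results = [None] * v
    for p in range(v_prime):  # physical processors (conceptually parallel)
        start = p * block
        stop = min(start + block, v)
        for i in range(start, stop):  # virtual processors, run sequentially
            results[i] = local_step(i)
    return results, block


if __name__ == "__main__":
    # 8 virtual processors on 2 physical ones: 4 sequential rounds each.
    values, rounds = simulate_superstep(8, 2, lambda i: i * i)
    print(values, rounds)
```

Assigning contiguous blocks means that virtual processors with neighbouring indices are executed close together in time on the same physical processor. That loosely mirrors the abstract's remark about turning communication locality into temporal locality, although the paper's actual strategy is not reproduced here.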

Related articles

Increased Acetate Ester Production of Polyploid Industrial Brewer’s Yeast Strains via Precise and Seamless “Self-cloning” Integration Strategy

Background: Enhancing the ethyl acetate yield of industrial yeast strains through a precise and seamless genetic manipulation strategy, without any extraneous DNA sequences, is an essential requirement and in significant demand. Objectives: To increase the ethyl acetate yield of an industrial brewer’s yeast strain, all the ATF1 alleles were overexpressed t...

Streams: Emerging from a Shared Memory Model

To date, OpenMP has been considered the workhorse for data parallelism and, more recently, task-level parallelism. The model has been one of shared memory working in parallel on arrays of a uniform nature, but many applications do not meet these often restrictive access patterns. With the development of accelerators on the one hand and moving beyond the node to the cluster on the other, OpenMP’s ...

Improving Multi-Application Concurrency Support Within the GPU Memory System

GPUs exploit a high degree of thread-level parallelism to efficiently hide long-latency stalls. Thanks to their latency-hiding abilities and continued improvements in programmability, GPUs are becoming a more essential computational resource. Due to the heterogeneous compute requirements of different applications, there is a growing need to share the GPU across multiple applications in large-sca...

A survey of memory architecture for 3D chip multi-processors

3D chip multi-processors (3D CMPs) combine the advantages of 3D integration and the parallelism of CMPs, which are emerging as active research topics in VLSI and multi-core computer architecture communities. One significant potentiality of 3D CMPs is to exploit the diversity of integration processes and high volume of vertical TSV bandwidth to mitigate the well-known “Memory Wall” problem. Mean...

Flexible Parallel Processing in Memory: Architecture + Programming Model

VLSI technology continues to develop at a staggering rate, presenting two challenges to computer designers: (i) how to capitalize on the additional resources that are available on a chip; and (ii) how to evolve computer architecture models that are well matched to the significantly changed physical parameters of new technology and the expanding needs of applications. One of the chief challenges i...

Publication date: 2002